ABSTRACT


Methods  of  representing  Korean  and  Chinese  characters 
are  presented,  using  a  limited  number  of  keystrokes  on  a 
standard  keyboard.  Various  attempts  have  been  made  to  find 
the  most  efficient  way  to  represent  these  characters  such  as 
enumeration  methods,  16-bit  coding  for  Korean  character 
syllables,  and  the  meaning  and  the  sound  method  for  Chinese 
characters.  Details  of  these  are  explained  with  a  brief 
introduction  to  some  general  properties  of  Korean  and 
Chinese  characters  currently  used  in  Korea. 
I.  INTRODUCTION


The development of computer and information processing
has come to the stage of being able to handle Korean and
Chinese character input and output. Information systems have
no problem with the input and output of characters from a
standard Roman character keyboard, but the problems related
to non-Roman characters, from I/O to the software problems
of language handling, remain almost unsolved. Until recently
the computer could not handle Korean or Chinese characters
efficiently. It was not user friendly, and data processing
in Korea was imperfect and very unwieldy. Among the
problems, the biggest issue is how to enter 2,369 Korean and
1,800 common Chinese characters from the standard Roman
character keyboard.

During  the  last  few  years,  there  have  been  great  efforts 
at  universities,  research  institutes  and  manufacturers  for 
the  development  of  good  I/O  devices  for  Korean  characters. 
In  Korea,  natural  language  processing,  especially  Korean 
language  processing,  is  one  of  the  essential  elements  for 
the  future  of  computer  and  information  systems. 

First, the properties of Korean and Chinese characters
will be presented as an introduction for those unfamiliar
with these characters. Then, the resolution power of CRTs
and dot matrix printers and their relation to the shape
characteristics (readability, aesthetic quality, etc.) of
Korean and Chinese characters will be discussed. The
methods which are developed for Korean and Chinese character
I/O can be applied to other character sets, especially to
many non-Roman alphabetic character sets, not to mention
Chinese characters in China.


II.  BACKGROUND


A.  PROPERTIES OF KOREAN DOCUMENTS

Common documents in Korea are usually written in a mixed
form utilizing Korean and Chinese characters. Minor use is
made of Roman script. The usage of each character set
depends on the kind of document. In order to perform word
processing efficiently in Korea, the simultaneous editing of
these characters is essential. Table I shows the use of
characters found for various types of documents. This data
is based on sampling performed expressly for this study. The
following sources were used in the sampling process to
construct Table I:

newspaper            - Korean Daily Times, "3A Era",
                       16 September 1984
journal              - "National Security", June 1984
technical papers (A) - "COBOL Programming", Dong-A
                       Publishing Co., 1978
technical papers (B) - "Introduction to Law", Beob Moon Sa
                       Publishing Co., 1978
business papers      - Korean Air Lines Co.

Although  the  sample  was  taken  from  a  single  source  for  each 
kind  of  document,  it  is  the  authors’  view  that  the  documents 
selected  are  representative  of  the  entire  population  of  each 
type. 


B.  CHARACTERISTICS OF KOREAN SCRIPT


The native Korean alphabet was introduced in 1446, after
centuries of the use of a more cumbersome method (known as
IDU) to transcribe Korean with Chinese characters. The set
of 28 letters¹ (now 24 letters) was designed by a group of
scholars commissioned by King Sejong (1419 - 1450), the
fourth King of the Yi dynasty.


                              TABLE I

                 Proportions of Written Characters

+-------------------+-------+---------+-----------+-----------+----------+
|                   | News- | Journal | Technical | Technical | Business |
|                   | paper |         | paper (A) | paper (B) | paper    |
+-------------------+-------+---------+-----------+-----------+----------+
| Roman script      |   1%  |    3%   |    40%    |     0%    |    10%   |
| Korean script     |  84%  |   76%   |    55%    |    55%    |    80%   |
| Chinese character |  15%  |   21%   |     5%    |    45%    |    10%   |
+-------------------+-------+---------+-----------+-----------+----------+

(A): Technical papers from western countries
(B): Traditional and historical papers

The  Korean  language  and  alphabet  is  spoken  and  written 
by  an  estimated  50  million  people  on  the  Korean  peninsula 
and  its  coastal  islands.  Many  among  the  approximately  one 
million  Koreans  residing  in  Japan,  China,  and  America  still 
speak  and  write  the  language  [Ref.  9]. 

The Korean alphabet currently used consists of 14
consonants (ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ) and 10
vowels (ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ). There are also 17
compound consonants (ㄲ ㄳ ㄵ ㄶ ㄸ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁㄱ
ㅃ ㅄ ㅆ ㅉ) and 11 compound vowels (ㅐ ㅒ ㅔ ㅖ ㅘ ㅙ ㅚ ㅝ
ㅞ ㅟ ㅢ). The letters of the Korean alphabet cannot be used
independently but are used to build syllables. Each Korean
character consists of two or three parts. The first part
must be a consonant or compound consonant. There are 19
letters that are possible for the first part of the Korean
character. They are typically consonants or compound
consonants (ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ
ㅍ ㅎ). The second part of the Korean character is typically
a vowel or compound vowel. There are 21 possible letters for
the second part of the Korean character (ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ
ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ). The third part of
the Korean character is optional and depends on the
character being depicted. The third part, if present, must
be a consonant or a compound consonant. There are 28
letters possible as the third part (ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ
ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅁㄱ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ).
This section has been summarized in Figure 2.1.


¹ A letter is an element of a character. A character
consists of two or three letters. The letters of the Korean
alphabet are a set of 14 consonants and 10 vowels.

The Korean system of writing is called "Hangul". It is
"phonetic" writing, like English, in the sense that the
symbols represent sounds, that is, consonants and vowels.
Unlike English symbols, which are grouped directly into
words (e.g., E+n+g+l+i+s+h = English), Korean symbols are
first grouped by syllable (e.g., H+a+n g+u+l = Han gul)
[Ref. 10].
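
The grouping of letters into syllables can be illustrated with a short sketch. It assumes the modern Unicode Hangul Syllables block (starting at U+AC00), which is not one of the codes discussed in this study, and the names INITIALS, MEDIALS, FINALS, and decompose are chosen here only for illustration. Unicode arranges the syllables as 19 x 21 x 28 combinations (27 final letters plus "no final"), so integer division recovers the letters of a syllable.

```python
# Sketch: split a precomposed Hangul syllable into its letters.
# Assumption: Unicode code points (U+AC00..U+D7A3), not this study's codes.

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 first sounds
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"     # 21 second sounds
# Index 0 means "no third sound"; the 27 Unicode finals follow.
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE = 0xAC00


def decompose(syllable):
    """Return (first, second, third) letters of one Hangul syllable."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < len(INITIALS) * len(MEDIALS) * len(FINALS):
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    first, rest = divmod(index, len(MEDIALS) * len(FINALS))
    second, third = divmod(rest, len(FINALS))
    return INITIALS[first], MEDIALS[second], FINALS[third]


if __name__ == "__main__":
    for syllable in "한글":
        print(syllable, decompose(syllable))
```

The two syllables of "Hangul" decompose into the letter groups H+a+n and g+u+l described above.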

Korean symbols are written in syllabic groupings. An
enumeration method² is to put letters side by side as in
"LONDON". But the Korean language stacks the letters in most
characters. For example, "LONDON" would be written 런던.
The simplest syllable is written with one consonant and one
vowel. When one writes the symbol for a vowel alone, one
must add the consonant symbol "ㅇ", which indicates an
initial mute (which is classed as a consonant). In this
simple consonant and vowel syllable, there are two types of
arrangements: side-by-side arrangement (e.g., 가) and
top-to-bottom arrangement (e.g., 고). The particular vowel
being written determines which arrangement is used.


² In an enumeration method letters are placed side by
side, element by element, using a set of consonants and
vowels.


THE CHARACTERISTICS OF KOREAN CHARACTERS

THE KOREAN ALPHABET CONSISTS OF 24 BASIC LETTERS (ELEMENTS):

   14 CONSONANTS: ㄱ ㄴ ㄷ ㄹ ㅁ ㅂ ㅅ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ
   10 VOWELS    : ㅏ ㅑ ㅓ ㅕ ㅗ ㅛ ㅜ ㅠ ㅡ ㅣ

EACH CONSONANT AND VOWEL CAN BE COMPOUNDED

 . POSSIBLE COMPOUND CONSONANTS
   ㄲ ㄳ ㄵ ㄶ ㄸ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁㄱ ㅃ ㅄ ㅆ ㅉ

 . POSSIBLE COMPOUND VOWELS
   ㅐ ㅒ ㅔ ㅖ ㅘ ㅙ ㅚ ㅝ ㅞ ㅟ ㅢ

EACH CHARACTER CAN BE DIVIDED INTO THREE PARTS (FIRST
SOUND, MIDDLE SOUND, FINAL SOUND) OR TWO PARTS (FIRST AND
SECOND SOUND).

 . THE FIRST PART MUST CONSIST OF A CONSONANT OR A COMPOUND
   CONSONANT
   THE SECOND PART MUST CONSIST OF A SINGLE OR A COMPOUND VOWEL
   THE THIRD PART IS OPTIONAL. IF USED, IT MUST BE A CONSONANT.

 . THE FOLLOWING LETTERS CAN BE USED AS THE FIRST PART:
   ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
   ; 19 LETTERS

 . THE FOLLOWING LETTERS CAN BE USED AS THE SECOND PART:
   ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
   ; 21 LETTERS

 . THE FOLLOWING LETTERS CAN BE USED AS THE THIRD PART:
   ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅁㄱ ㅂ ㅄ ㅅ
   ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ ; 28 LETTERS

 . NUMBER OF POSSIBLE COMBINATIONS OF CHARACTERS
   = 19 * 21 * 29 = 11,571
   IN PRACTICE, ONLY ABOUT 2,400 CHARACTERS ARE USED.

Figure 2.1  The Characteristics of Korean Characters.

Representing these character syllables through a
computer creates a problem because each letter's (consonant
and vowel) shape can be different due to a requirement that
each character be balanced, i.e., have the same size and
achieve a desired aesthetic quality. For example, when ㄱ is
placed to the left of a vowel, the downward portion is
slanted (e.g., 가). When it is placed on top of the vowel,
the downward portion becomes straight (e.g., 고). As shown
above, it is very difficult to apply these different shapes
for a particular letter to a line printer and a typewriter.
This problem will be discussed in detail in the following
chapter.

By mathematical calculation, the possible number of
Korean characters is 11,571 (19 * 21 * 29). It must be
noted, though, that only 2,369 characters are commonly used
[Ref. 8: p. 11].
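
The count above can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and the factor 29 is read here as the 28 possible third-part letters plus the case of no third part.

```python
# Sketch: number of possible Korean characters under the counts given above.
FIRST_SOUNDS = 19          # possible first-part letters
SECOND_SOUNDS = 21         # possible second-part letters
THIRD_SOUNDS = 28          # possible third-part letters (the part is optional)

# The optional third part adds one more choice: "no third letter".
possible = FIRST_SOUNDS * SECOND_SOUNDS * (THIRD_SOUNDS + 1)

commonly_used = 2369
share = commonly_used / possible   # fraction of combinations in common use

if __name__ == "__main__":
    print(possible)
    print(round(share * 100, 1), "percent in common use")
```

Only about one fifth of the mathematically possible syllables therefore need to be supported for common use.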

C.  CHARACTERISTICS OF SINO-KOREAN CHARACTERS

Sino-Korean characters are Chinese characters used in
Korea. They are different from those used in China. Koreans
refer to Chinese characters as Hanja. Chinese characters
have a long history, the earliest discovered writings having
been dated from about the 14th century B.C. In 109 A.D.,
during the Han Dynasty, this writing was modified by Hsu
Sheng (30 - 124) in his 15-volume paleographical work,
Shuo-wen Chieh-tzu (說文解字), which translates to the
explanation of writing and analysis of words. That work
lists 9,353 characters under 540 radical entries. Of this
number, 364 are pictographic, 125 simple ideographic, 1,167
compound ideographic, and 7,697 phonetic compounds.


The most complete collection, the Kang Hsi Dictionary,
with about 50,000 characters, was published in 1715. Since
1949, after the establishment of the People's Republic of
China, the Chinese government actively pursued language
reform until the Cultural Revolution, 1966-1976. The Chinese
government changed and simplified the characters from the
original [Ref. 5: p. 15].

The number of characters used commonly is from 1,000 to
3,000. Table II [Ref. 1: p. 819] shows the frequency of
Chinese characters used in typical documents.


                           TABLE II

       Frequency of Chinese Characters Used in Documents

               News-    General   |  Total     News-    General
               papers   Document  |  Document  papers   Document
               (%)      (%)       |  (%)       (chrs)   (chrs)
 ----------------------------------------------------------------
 1st 10 chrs   10.0      8.8      |    80       49°       638
         50    27.5     25.5      |    85       615       777
        100    38.9     36.1      |    90       781       OQ?
        200    55.4     51.0      |    95      1068      1358
        500    79.0     73.5      |    96      1156      1479
       1000    93.1     89.0      |    97      1269      1617
       1500    97.4     95.0      |    98      1421      1832
       2000    98.7     97.6      |    99      1661      2157
       2500    98.9     99.4      |   100      2878      3323
       3000    99.8               |

 * chrs: abbreviation of characters

In 1972, the Korean Ministry of Education suggested that
1,800 Chinese characters be learned and used for educational
purposes [Ref. 3]. In this study, the authors will restrict
themselves to that set of 1,800 characters. The Chinese
characters are called Hantzu in Chinese, Hanja in Korean,
and Kanji in Japanese. All mean "Han Characters" (漢字).
These characters are used exclusively in Chinese writings,
and in combination with the Hangul (Korean) alphabet in
Korea and with the Kana syllabaries in Japan. The
Sino-Korean (Hanja), in written form, is a combination of
three major elements: pictograms, ideograms, and
phonograms [Ref. 5: p. 22].

In the next chapter the perspective of a picture for
each character will be used because of both the complexity
of Chinese characters and the ease of representation in the
computer. Each Chinese character has a meaning and a sound;
for example, 天 means heaven and its sound is cheon. Also,
there are many characters which have different meanings but
the same sound, or the same meaning but different sounds.
Several methods exist to solve this problem.
Appendix A [Ref. 5: p. 17] shows the evolution of Chinese
characters.


III.  PROBLEMS  OF  EDITING  KOREAN  AND  CHINESE  SCRIPTS 


A.  CURRENT EDITING TECHNOLOGY

The current word processing practice in Korea is to type
Korean characters by the enumeration method, that is, to
input letters (8-bit code: consonant and vowel in sequence)
and output these letters as a character syllable using a
Korean character conversion program for Korean script.
Appendix B shows the EBCDIC input codes currently used by
FACOM, and Appendix C depicts the MDS (Mohawk Data Sciences)
input codes used by IBM. To type Chinese characters the
following sequence is followed:

1.  Depressing  a  Chinese  character  function  key. 

2.  Typing  the  sound  character  of  a  Chinese  character 
using  the  enumeration  method. 

3.  Displaying  all  homonym  (from  1  to  60)  characters 
[Ref.  4:  p.  34]  that  have  the  same  sound. 

4.  Selecting  one  character  by  using  an  index  number,  and 
entering  the  character  to  a  buffer  or  file. 
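
The four-step sequence above can be sketched as a table lookup followed by a selection. This is only an illustration under stated assumptions: the dictionary HOMONYMS holds a small hand-picked subset (four characters read "cheon"), and the function names are hypothetical, not part of any IBM or FACOM system.

```python
# Sketch of homonym selection: sound -> candidate list -> index choice.
# Illustrative subset only: four Hanja that share the sound "천" (cheon).
HOMONYMS = {
    "천": ["天", "千", "川", "泉"],
}


def list_candidates(sound):
    """Step 3: pair every homonym of `sound` with a 1-based index number."""
    return list(enumerate(HOMONYMS.get(sound, []), start=1))


def select_character(sound, index):
    """Step 4: return the character chosen by its index number."""
    candidates = HOMONYMS.get(sound, [])
    if not 1 <= index <= len(candidates):
        raise IndexError("no candidate %d for sound %r" % (index, sound))
    return candidates[index - 1]


# Steps 1-2 (function key, typing the sound) are represented by the argument.
buffer = []
buffer.append(select_character("천", 1))

if __name__ == "__main__":
    print(list_candidates("천"))
    print(buffer)
```

With up to 60 homonyms per sound, the user must scan the displayed list for every character, which is the source of the inconvenience discussed in this section.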

Machines dealing with Korean language data are currently
available from the IBM and FACOM corporations in Korea;
IBM's Multistation 5550 (1984) and FACOM OS IV (KEF) (1982)
are newly updated and well developed machines. These
machines still have several disadvantages in handling Korean
and Chinese characters:

1.  A large amount of time is spent in character
conversion.

2.  It is difficult to directly delete and insert records
in a file.

3.  The word processing editor cannot recognize the
characters being edited before executing a character
conversion program since only the enumerated letters
can be displayed.

4.  The method of entering characters is inconvenient and
requires a tremendous amount of effort for Chinese
characters.

5.  One cannot convert all Korean character syllables
into Chinese characters because there is not a one-to-one
mapping.

6.  Data communication is impossible since there are no
standard codes for Korean and Chinese characters.

Appendices D and E show the keyboards of the IBM
Multistation 5550 [Ref. 8: p. 14] and FACOM OS IV (KEF)
[Ref. 7: p. 48] respectively.

B.  USER REQUIREMENTS

Most potential users have recognized that the computer
is essential in data processing and office automation.
However, because of the above constraints, current computers
are unsatisfactory for use with the Korean language. Some
general requirements that users place on computer
researchers and manufacturers are the following:

1.  Users want to use Korean language commands and
programs, but there are no Korean language oriented
operating systems or programming languages comparable to
COBOL, FORTRAN, Pascal, etc.

2.  Users want to edit three kinds of characters
simultaneously and in a user friendly manner.

3.  Users want to display and print out data without
using a conversion program, as is currently done with the
Korean alphabet, because of the time, memory space, and
inconvenience involved.

4.  Users want to use interactive files and database
processing.


In summation, they want to use computers that handle three
kinds of script in the same manner in which present
computers do with the Roman alphabet.

C.  PROBLEMS  OF  REPRESENTATION  OF  THE  THREE  KINDS  OF  SCRIPTS 

Because of the characteristics of Korean and Chinese
characters, the following problems occur:

1.  How can one enter 2,400 Korean characters and 1,800
Chinese characters into a computer through a limited
number of keystrokes?

2.  How can one develop the system program to direct
input and output without using a conversion program?

3.  How can the aesthetic quality of display and output be
improved?

4.  How can one increase the processing speed and reduce
the memory space for these character definitions?

There are other problems, but the above problems are the
most significant. Among these problems the first one is the
most serious, and consequently the authors will give it more
attention in this study.


IV.  POSSIBLE METHODS FOR KOREAN LANGUAGE DATA PROCESSING

In  order  to  solve  the  problems  which  were  mentioned  in 
the  previous  chapter,  the  following  methods  are  offered  as 
possible  alternatives  for  Korean  language  data  processing. 


A.  8-BIT CODE FOR KOREAN ALPHABET

Since the Korean alphabet consists of only 24 letters,
Korean language data can be expressed using only Korean
characters without a serious problem. The enumeration
method, like the Roman alphabet, is the easiest way to
represent Korean characters without changing the hardware
and the operating system. This method is not highly readable
and would require changes in the language which may not be
acceptable to users.

1.  Using the Current Standard Keyboard

A program can be loaded which defines the 24 letter
Korean alphabet to a character generator instead of the
lower case Roman alphabet. All Korean alphabet elements and
the upper case Roman alphabet characters are then available
through the standard Roman character keyboard. With this
method the user can use a computer in a manner similar to
users of the Roman alphabet. In addition, well developed
hardware and software can be used without critical
problems. This method has been suggested by many groups of
people from the time when the Korean typewriter was first
developed. The only disadvantage is the breaking of
traditional custom. To capitalize on developed technology
and for ease of application, more study and research should
center on user acceptability of the enumeration method.


Figure 4.1 shows an example of hard copy which uses a
graphic dot printer and a standard keyboard. Appendix F
shows the load command program for an alternative character
generator for the Korean alphabet. This program can be
generated easily by the alternative character set editor,
and it loads the Korean alphabet to an alternative character
generator instead of the lower case Roman alphabet.


Figure 4.1  Example Using Standard Keyboard.

2.  Using the Capital Letters as the Initial Letter

The major difficulty with the enumeration method is
poor readability. Korean users read a sentence sequentially
syllable by syllable. In order to increase readability, the
initial letter of each character can be written as an upper
case letter to distinguish the syllable easily. Figure 4.2
shows an example using the capital letters and Appendix G
represents the load command program for these letters. A
special mark or an altered shape of each letter also can be
applied to increase readability when an enumeration method
is used.


Figure 4.2  Example Using Capital Letters.
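
The idea of marking the initial letter of each syllable in enumerated text can be sketched as follows. The sketch is an assumption-laden illustration, not the load command program of the appendices: it relies on Unicode canonical decomposition (NFD) to enumerate the letters, and because Hangul has no upper case it inserts a visible marker character before each initial letter instead. The function name and the default marker are hypothetical.

```python
import unicodedata

# Unicode range of the 19 modern leading consonants (conjoining jamo).
FIRST_INITIAL = 0x1100
LAST_INITIAL = 0x1112


def enumerate_with_markers(text, marker="'"):
    """Enumerate Hangul letters side by side, flagging each syllable start.

    Assumption: `marker` stands in for the capital-letter shape used in
    the method described above.
    """
    out = []
    for letter in unicodedata.normalize("NFD", text):
        if FIRST_INITIAL <= ord(letter) <= LAST_INITIAL:
            out.append(marker)        # a new syllable begins here
        out.append(letter)
    return "".join(out)


if __name__ == "__main__":
    print(enumerate_with_markers("한글", marker="|"))
```

A reader can then find syllable boundaries in the enumerated stream without a conversion program, which is the readability gain the method aims for.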

B.  16-BIT CODE FOR THE THREE KINDS OF SCRIPT

There are various methods one can use to enter Korean
and Chinese characters, but the 16-bit code is one of the
better methods, since it can identify all possible Korean
and Chinese characters without using the enumeration method
and a conversion program. The structure of this code will be
discussed briefly in the following subsection.

1.  16-bit Code for Korean Script

As mentioned before, a Korean character syllable
consists of three parts:

1.  First sound; one of 19 simple or double consonants.

2.  Second sound; one of 21 simple or compound vowels.

3.  Third sound; one of 28 simple or compound consonants
(optional).

Since the number of possible first, second, and third
letters is in each case less than 32, 5 bits are enough to
identify each sound. All possible Korean characters can be
identified using 15 bits. The 1st bit of the 16 bits is used
to indicate a Korean character (by a 0). The next 5 bits are
used for the first sound, the following 5 bits for the
second sound, and the final 5 bits for the third sound.
Table III shows the structure of the 16-bit code for the
Korean character and Table IV explains the 16-bit code for
the Korean character. This code table is basically the same
as the IBM 2-byte internal Korean character code [Ref. 6:
p. 52]. The only difference is the arrangement. Some IBM
codes represent three letters. This makes key tops (the face
of each key) more complex; for example, 00100 (one key top)
represents three values. The code suggested in Table IV
reduces some of this complexity by limiting the possible
values to no more than two for each key top. In contrast to
the example for IBM codes, the same code from Table IV
represents only one value. Appendix H represents the IBM
2-byte internal Hangul code for the Korean character.


                TABLE III

Structure of 16-bit Code for Korean Script

   |<- 1st Byte -->|<- 2nd Byte -->|
   |0| | | | | | | | | | | | | | | |
   |*|1st sound|2nd sound|3rd sound|
   | | 5 bits  | 5 bits  | 5 bits  |

   * 0: Korean character

The suggested code has several advantages. First, it
is easy to sort characters by their code value since the
value of each letter is in the order of the Korean alphabet.
Second, it can reduce the memory space for data by using 2
bytes instead of 3 bytes for one character. Third, it is
possible to edit the character directly since it does not
need code conversion. Finally, since the code value of a
Korean character can easily be recognized, it helps the
programmer when programs are written.
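
The bit layout of Table III can be sketched with shifts and masks. This is a minimal illustration: the function names are hypothetical, and the 5-bit field values used in the example are arbitrary numbers rather than entries of Table IV.

```python
# Sketch: pack and unpack the 16-bit Korean code of Table III.
# Layout (most significant bit first): 1 flag bit (0 = Korean),
# 5 bits first sound, 5 bits second sound, 5 bits third sound.

FIELD_MASK = 0x1F   # five bits


def pack_korean(first, second, third=0):
    """Combine three 5-bit sound codes into one 16-bit value."""
    for value in (first, second, third):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("each sound code must fit in 5 bits")
    return (0 << 15) | (first << 10) | (second << 5) | third


def unpack(code):
    """Return (flag, first, second, third) from a 16-bit code."""
    return (
        (code >> 15) & 1,
        (code >> 10) & FIELD_MASK,
        (code >> 5) & FIELD_MASK,
        code & FIELD_MASK,
    )


if __name__ == "__main__":
    code = pack_korean(0b00001, 0b00010, 0b00100)
    print(format(code, "016b"), unpack(code))
```

Because the first sound occupies the most significant field, comparing two codes as plain integers orders them by first sound, then second, then third. This is why sorting by code value follows alphabetical order whenever the 5-bit values themselves are assigned in alphabetical order.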
TABLE IV

16-bit Code for Korean Script

(Table body: the Hangul letter assigned to each 5-bit code,
00000 through 11111, in three columns for the 1st sound
letter, the 2nd sound letter, and the 3rd sound letter.)

*  Blank:  Not used


2.  Code for Chinese Characters

There is no limitation in the number of usable
Chinese characters, but statistics show that 1,800-3,000
characters cover 98-99.8 percent of those which appear in
newspapers and journals (Table II). Currently there are
only two ways to represent Chinese characters in Korea. One
method is comprised of two steps. The first step is to
display all Chinese characters (homonyms) which have the
same sound after entering the desired sound, and the second
step is to select the needed Chinese character in the
display and enter it via the index number matched to that
character. The other method is to type a Korean word, which
consists of two or three characters, and then convert it as
a unit to Chinese characters.

The former is inconvenient and takes a long time to
edit. The latter has no flexibility in that it is limited by
the programmed word codes. To solve the above problem and
simplify the identification of each character using a
limited number of keystrokes, a 16-bit code for Chinese
characters can be applied. Table V represents the structure
of the 16-bit code for Chinese characters.

Chinese characters represent both meaning and
phonetics to Koreans. To simplify the code, the complete
meaning and sound of the Chinese characters are not needed.
The meaning of a Chinese character is written with from one
to five syllables and the sound with one character.
Simplicity can be achieved by employing abbreviations or
acronyms for each part (meaning and sound). For example, the
Chinese character (天) has the meaning "Hea-Ven" and the
sound cheon. In this case we use H of Hea, V of Ven, and C of
cheon as a code for (天).
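
The acronym idea can be sketched for the example above, where the Korean meaning of 天 is "하늘" and its sound is "천". The sketch makes several assumptions that are not taken from this study: the first letter of each syllable is found through Unicode arithmetic, the letter codes are simply 1 to 19 in alphabetical order (the pure acronym case, with 0 meaning "absent"), and the function names are hypothetical.

```python
# Sketch: 16-bit acronym code for a Chinese character.
# Layout: 1 flag bit (1 = Chinese), then three 5-bit fields holding the
# first letter of the 1st meaning syllable, the first letter of the 2nd
# meaning syllable (0 if absent), and the first letter of the sound.

HANGUL_BASE = 0xAC00
SYLLABLES_PER_INITIAL = 21 * 28   # Unicode: 21 medials x 28 finals


def initial_code(syllable):
    """Return 1..19 for the first letter of a Hangul syllable."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < 19 * SYLLABLES_PER_INITIAL:
        raise ValueError("not a Hangul syllable: %r" % syllable)
    return index // SYLLABLES_PER_INITIAL + 1


def chinese_code(meaning, sound):
    """Build the code from a Korean meaning word and a sound syllable."""
    first = initial_code(meaning[0])
    second = initial_code(meaning[1]) if len(meaning) > 1 else 0
    third = initial_code(sound[0])
    return (1 << 15) | (first << 10) | (second << 5) | third


if __name__ == "__main__":
    print(format(chinese_code("하늘", "천"), "016b"))
```

Three keystrokes (one per acronym letter) thus select the character directly, in place of typing the sound and searching a homonym list.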


                  TABLE V

Structure of 16-bit Code for Chinese Characters

   |<- 1st Byte -->|<- 2nd Byte -->|
   |1| | | | | | | | | | | | | | | |
   |*|<- (a) ->|<- (b) ->|<- (c) ->|

   (a) 1st letter of 1st meaning character
   (b) 1st letter of 2nd meaning character
   (c) 1st letter of 3rd meaning character

   * 1: Chinese character


But this method may result in duplicate codes, since
different Chinese characters with another meaning may have
the same value as HVC. In order to eliminate the duplicate
codes and to use the 3 letter code which is compatible with
the 16-bit Korean character code, the following
characteristics of the sound and meaning of Chinese
characters are relevant. First, only 428 syllables are used
to represent the sounds for all Chinese characters. That is,
one sound can represent 1 to 60 Chinese characters. Second,
the frequency of Korean characters used for the meaning and
sound is irregular in distribution. More specifically, 20%
of Korean characters are used to represent the sound and
meaning of 95% of Chinese characters [Ref. 11].
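
How duplicate codes arise, and how their proportion might be measured, can be sketched on a toy sample. Everything here is illustrative: the four characters and their acronym letters were chosen by hand, the function name is hypothetical, and the proportion is assumed to mean the share of characters whose code is also held by at least one other character.

```python
from collections import Counter

# Toy sample: character -> (1st meaning letter, 2nd meaning letter, sound
# letter). "" marks a meaning of only one syllable.
ACRONYMS = {
    "天": ("ㅎ", "ㄴ", "ㅊ"),   # 하늘 천
    "川": ("ㄴ", "", "ㅊ"),     # 내 천
    "手": ("ㅅ", "", "ㅅ"),     # 손 수
    "數": ("ㅅ", "", "ㅅ"),     # 셈 수
}


def duplicate_proportion(acronyms):
    """Fraction of characters that share their acronym with another one."""
    counts = Counter(acronyms.values())
    duplicated = sum(1 for code in acronyms.values() if counts[code] > 1)
    return duplicated / len(acronyms)


if __name__ == "__main__":
    print(duplicate_proportion(ACRONYMS))
```

In this sample 手 and 數 collide because both have a one-syllable meaning and a sound beginning with the same letters. Regrouping the letters so that frequent ones receive codes of their own is one way to lower the proportion.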

As a result of analyzing the 1,300 sound characters
and 1,438 meaning characters used to represent the Chinese
characters, Table VI and Table VIII are derived. Table VI
represents the number of Chinese characters which have the
same first sound letter and the same second sound letter.
For example, 266 Chinese characters have the first sound
letter (ㄱ), and 44 Chinese characters have (ㄱ) as a first
sound letter and ( _ ) as a 2nd sound letter. One must
rearrange the sound acronym to the 5 bit code since the
distribution of sound characters is irregular. Table VII
depicts the rearranged code value for the acronym of the
Chinese sound character.


TABLE VI

Characters Having Same 1st & 2nd Letter Sound


In Table VII, one code value can represent a group of
several related sound letters rather than a single letter.
One such group collects the vowels that are written with the
first consonant at their left. Another collects vowels such
as ㅗ and ㅜ, which are written with the first consonant
above the vowel.

Since the frequencies of the Korean character syllables
representing the sound and the meaning of Chinese characters
are different, the frequency of the Korean characters
representing the meaning of Chinese characters also needs to
be analyzed. After analyzing the sampled 1,438 characters,
which are the first and the second meaning characters,
Table VIII is derived, which shows the number of times a
meaning character or a group is used.

The meaning acronym values can be reassigned to a 5-bit
code on the basis of Table VIII. Table IX shows the
reassigned 5-bit codes representing the acronym of the
meaning character. The same theory is applied in deriving
Table IX as in Table VII. As the acronym code is
rearranged, the proportion of duplicate codes can be
reduced. As a result of applying these rearranged codes,
Table X can be produced, which shows the proportion of
duplicate codes. Three codes are compared: the pure acronym
code (Table IX), which represents the acronym of a meaning
and a sound character as a first letter code of Korean
characters (ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ
ㅍ ㅎ; 19 possible consonants); the arranged sound character
acronym code (Table VII); and the arranged sound and meaning
character acronym code (Table VIII).

The reasons why some duplicate codes cannot be
eliminated are: first, some Chinese characters have similar
meaning and sound, which generates the same acronym code (22
among 1,800); and second, there are initially some Chinese
characters which have the same meaning and sound (12 among
1,800).


                          TABLE X

                Proportion of Duplicate Code

+------------+--------------+------------------+------------------+
| Number of  | Pure acronym | Rearranged sound | Rearranged sound |
| characters | code         | character code   | and meaning      |
|            |              |                  | character code   |
+------------+--------------+------------------+------------------+
|   1,800    |     7.5%     |       2.3%       |       1.8%       |
+------------+--------------+------------------+------------------+


3. 16-bit Code for Roman Alphabet and Symbols

In order to mix the three kinds of character code, simplify the
I/O controller, and unify the word length, 16-bit codes for the
Roman alphabet and symbols should be generated by only one
keystroke. For data communication and for familiarity, the
16-bit code for the Roman alphabet, symbols, and control
characters can be defined simply by adding the default byte
(00H) to the ASCII code. When one uses only Roman alphanumeric
characters, one can easily convert this 16-bit code to ASCII
code. Table XI shows the 16-bit code for the Roman alphabet and
symbols.
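
The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the default byte is assumed to be 00H and to occupy the high-order byte, as the example codes such as 0053 (S) suggest.

```python
# Assumption: the default byte is 00H and sits in the high-order byte.
DEFAULT_HIGH_BYTE = 0x00


def ascii_to_code16(ch: str) -> int:
    """Return the 16-bit code for one ASCII character (hypothetical helper)."""
    value = ord(ch)
    if value > 0x7F:
        raise ValueError(f"not an ASCII character: {ch!r}")
    return (DEFAULT_HIGH_BYTE << 8) | value


def code16_to_ascii(code: int) -> str:
    """Drop the default high byte and return the ASCII character."""
    if code >> 8 != DEFAULT_HIGH_BYTE or (code & 0xFF) > 0x7F:
        raise ValueError(f"not a Roman alphabet or symbol code: {code:#06x}")
    return chr(code & 0xFF)


codes = [ascii_to_code16(c) for c in "School"]
print([f"{c:04X}" for c in codes])
# prints ['0053', '0063', '0068', '006F', '006F', '006C']
```

Because the high byte is constant, converting back to ASCII only requires discarding it.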

4. Keyboard for 16-bit Code

As mentioned in the previous chapter, the biggest issue is how
to enter all Chinese characters, Korean characters, and the
Roman alphabet with a simple keyboard. To implement the 16-bit
code on a keyboard, one needs keystrokes that generate "1" or
"0" from a Chinese character function key (bit 1), three 5-bit
codes (00000-11111) for Chinese and Korean characters (bits
2-16), and the 16-bit code for the Roman alphabet and symbols.
In this case 33 more keys than the common Roman alphabet
keyboard are


TABLE XII

Leveled Letters on 32 Key Tops

[Key top layouts: Alternative I (I-A, I-B) and Alternative II (II-A, II-B)]

Legend:  1  Roman alphabet
         2  First letter of Korean character
         3  Second letter of Korean character
         4  Third letter of Korean character
         5  Acronym of Chinese sound character
         6  Acronym of Chinese meaning character


only Roman alphabets in the previous sentence without a
function key. The two syllables of "school" are formed as
follows. In Korean, the first syllable is "ㅎ" selected from the
first sound letter position, "ㅏ" from the second sound
position, and "ㄱ" from the third position. The second syllable
is "ㄱ, ㅛ, Default". Then, in Chinese, press the Chinese
character function key, which generates "1" as the first bit.
The first syllable is "ㅂ", which is the acronym of the first
meaning character, "ㅇ", which is the acronym of the second
meaning character, and "ㅎ", which is the acronym of the sound
character. The second syllable is "ㅎ, ㄱ, ㄱ". After typing
Chinese characters, the user must release the function key to
type Korean characters. Table XIII explains the above example.

TABLE XIII

Typing Procedures for Mixed Characters

To type "school, 학교, 學校", the following procedures should be
followed:

1. Type "school" by one keystroke for each character without a
function key.

2. In Korean, to type "학교",
first, type "ㅎ, ㅏ, ㄱ",
second, type "ㄱ, ㅛ, Default".

3. In Chinese, to type "學校", first press the function key,
then type "ㅂ, ㅇ, ㅎ" and "ㅎ, ㄱ, ㄱ" (Table IX), since for "學"
the meaning is "배울" and the sound is "학", and for "校" the
meaning is "학교" and the sound is "교".


As a result of the above example, the computer generates the
following codes in hexadecimal: 0053 (S), 0063 (c), 0068 (h),
006F (o), 006F (o), 006C (l); then 0 (Korean character), 11111
(ㅎ), 00010 (ㅏ), 00001 (ㄱ), that is 0111 1100 0100 0001
(7C41: 학); 0 (Korean character), 00001 (ㄱ), 10010 (ㅛ), 00000
(Default), that is 0000 0110 0100 0000 (0640: 교); 1 (Chinese
character), 01010 (ㅂ), 10100 (ㅇ), 11110 (ㅎ), that is 1010
1010 1001 1110 (AA9E: 學); and 1 (Chinese character), 11110
(ㅎ), 00010 (ㄱ), 11110 (ㄱ), that is 1111 1000 0101 1110
(F85E: 校).
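
The bit layout used in this example (one function-key bit followed by three 5-bit codes) can be checked with a short Python sketch. The function names are hypothetical; the 5-bit values are taken directly from the example above.

```python
def pack_code16(is_chinese: bool, first: int, second: int, third: int = 0) -> int:
    """Pack the function-key bit and three 5-bit codes into one 16-bit code.

    Bit 1 (most significant) is the Chinese character flag; bits 2-16 hold
    the three 5-bit codes in keystroke order. third=0 stands for "Default".
    """
    for part in (first, second, third):
        if not 0 <= part <= 0b11111:
            raise ValueError("each code must fit in 5 bits")
    return (int(is_chinese) << 15) | (first << 10) | (second << 5) | third


def unpack_code16(code: int) -> tuple:
    """Split a 16-bit code back into (is_chinese, first, second, third)."""
    return bool(code >> 15), (code >> 10) & 0x1F, (code >> 5) & 0x1F, code & 0x1F


# The four syllables of the example, in the order they are typed.
print(f"{pack_code16(False, 0b11111, 0b00010, 0b00001):04X}")  # prints 7C41
print(f"{pack_code16(False, 0b00001, 0b10010):04X}")           # prints 0640
print(f"{pack_code16(True, 0b01010, 0b10100, 0b11110):04X}")   # prints AA9E
print(f"{pack_code16(True, 0b11110, 0b00010, 0b11110):04X}")   # prints F85E
```

Since the flag occupies the most significant bit, any code of 8000H or above is a Chinese character, which makes the two scripts easy to separate by a single comparison.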

5. Operating System for Input and Output

To apply the suggested system, the operating system must be
redesigned for input and output control.
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Figure  4.3  shows  the  flowchart  of  input  and  output  control. 
First, the input and output controller has to distinguish
whether the Chinese character function key is "0" or "1". A
flag register can be used for the Chinese character function
key. If the flag is "1", then "1" is loaded into the first bit
of the 16-bit register, and 5-bit codes are read until the
register is full. When it is full, a character is displayed on
the CRT, and the 16-bit code is sent to a buffer as data.
Otherwise the flag is "0"; then "0" is loaded into the
register, and 5-bit codes are read for a Korean character. A
Roman alphabet character or a symbol is read as a complete
16-bit code, which fills the 16-bit register at once. Whenever
the 16-bit register is full, the identified character is
displayed and sent to a buffer as data. If the "stop edit?"
condition shown in Fig 4.3 is "no", the input and output
controller loops to read a code, display a character, and send
a character code to a buffer.
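
The control loop can be sketched in Python as follows. This is a simplified model, not the flowchart of Figure 4.3 itself: the event names ("fn", "roman", "code5") are hypothetical, and displaying and buffering a character are both represented by yielding its 16-bit code.

```python
def assemble_codes(events):
    """Yield 16-bit character codes from a stream of keyboard events.

    Each event is a (kind, value) pair (hypothetical format):
      ("fn", 0 or 1)   state of the Chinese character function key
      ("roman", code)  a complete 16-bit Roman alphabet or symbol code
      ("code5", value) one 5-bit code for a Korean or Chinese character
    """
    chinese_flag = 0   # flag register for the function key
    pending = []       # 5-bit codes read so far for the current character
    for kind, value in events:
        if kind == "fn":
            chinese_flag = value & 1
        elif kind == "roman":
            yield value & 0xFFFF          # register is filled at once
        elif kind == "code5":
            pending.append(value & 0x1F)
            if len(pending) == 3:         # flag bit + 3 * 5 bits = 16 bits
                first, second, third = pending
                yield (chinese_flag << 15) | (first << 10) | (second << 5) | third
                pending = []
        else:
            raise ValueError(f"unknown event kind: {kind!r}")


events = [
    ("roman", 0x0053),
    ("code5", 0b11111), ("code5", 0b00010), ("code5", 0b00001),
    ("fn", 1),
    ("code5", 0b01010), ("code5", 0b10100), ("code5", 0b11110),
    ("fn", 0),
]
print([f"{code:04X}" for code in assemble_codes(events)])
# prints ['0053', '7C41', 'AA9E']
```

A real controller would also need to decide what happens when the function key changes in the middle of a character; the sketch simply uses the flag value current at the third keystroke.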

This system will make Korean language commands and programs
easier to use than those presently available. To achieve the
above goals, a compiler and interpreter, as well as the
operating system, will require redesigning. This system will
require the complete rewriting of all software currently used.
The economic impact of this on the Korean people will be
enormous.

6. Design Considerations for Character Generation


There are two shapes of characters used in Korea: Gothic
(Figure 4.4) and Brush type (i.e., Ming style: Figure 4.5)
[Ref. 7: p. 34]. To generate these shapes, several methods of
character generation can be considered. To select the best
method for Korean and Chinese characters, one can use the
following five criteria: speed, space, quality, flexibility,
and cost. Speed is a double standard: speed of creation may
range from a few minutes to a few hours, while speed of
production should go beyond 3 1 JO') characters/sec depending
on type size and device resolution. Space refers to the average
size of the code for one character as well as the size of the
internal buffers often needed for decoding. Quality is
proportional to the largest dot matrix which can be used to
decode a character; it should not be confused with the
resolution of the output device. For a given type size, the
resolution sets the definition, that is the size of the matrix
to be used; definition, hence type size, is bounded by the
quality of the code. Flexibility refers to the different
automatic modifications which are supported by the code:
scaling, rotating, family variations (as going from light to
bold). Cost is


Figure 4.5 An Example of Brush Type.


Obviously the five criteria above are not independent. Figure
4.6 shows the interrelationship of the criteria in designing a
character generator [Ref. 12: p. 241]. The most desirable
feature is indicated by the direction of the arrow. Solid
(resp. dotted) lines indicate agreement (resp. contrariety)
between the variation of the factors. The design of a digital
character generator is an engineer's task whose goal is to
strike the appropriate balance between the specifications for
those five criteria combined with the characteristics of the
production device, resolution and scanning, and the necessity
of operating the corresponding creation station.


Figure 4.6 Criteria in Designing a Character Generator.

Table XIV [Ref. 12: p. 268] gives a summary of the main
characteristics of the coding methods that the engineer can
utilize [Ref. 12: p. 269]. When the characteristics of Korean
and Chinese characters are compared with Table XIV, it should
be apparent that the bit map method is the most appropriate one
for this application. It reduces code space and buffer space.
It has good video scan, high speed, and highly readable low
quality printing. Unfortunately this method lacks flexibility.
However, for all the other aforementioned reasons, the bit map
method is commonly used for Korean and Chinese characters.


TABLE XIV

Comparative Table for Performance

[Table body: coding methods (run-length, chain, differential,
and others) compared by code space (bits), buffer space (bits),
flexibility, video scan, resolution (n), and quality Q; legible
code-space entries include k*n*log2 n for run-length and
6*k*n + b(log n + c) for differential coding]

Legend:
b: The number of birth points in the character
c: A constant taking care of bookkeeping
n: The size of the matrix
k: The average number of runs per matrix line or column (a
   measure of the simplicity of the character shapes:
   approximately 4 for Roman body-text fonts, higher than 10
   for Chinese characters)

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Because  of  the  complexity  of  Korean  and  Chinese 
characters, at least a 16 by 16 resolution is required for
Korean characters, and a 24 by 24 resolution for Chinese
characters. Resolutions of 32 by 32, 64 by 64, 80 by 80, 96 by
96, and 128 by 128 are desirable when much more beauty is
required and also when larger character sizes are to be
produced. However, if these characters are displayed on a CRT
with 32 by 32 resolution, with each character 7-10 mm square,
this should be sufficient.

It is the authors' opinion that the less expensive 32 by 32
resolution CRT should be used for softcopy. One reason is that
the price of the memory components required to hold the
character definitions is continually falling. A stronger
motivation, however, is that high speed and flexibility of
typing are then possible [Ref. 1: p. 828]. IBM corporation uses
16 by 16 resolution for Gothic and 24 by 24 for Brush type
Korean character syllables [Ref. 8: p. 2]. FACOM corporation
uses 39 by 30 dots for Korean and Chinese characters and 24 by
30 dots for Roman alphabet, symbols, and Korean alphabet
(letters) on a laser printer [Ref. 7: p. 47]. As cheaper dot
matrix and laser printers find their way into the marketplace,
the quality of characters will become less of an issue.
Presently there are few problems with the quality of
representing Korean characters that cannot be solved through
the additional expenditure of money. For the definition of each
character, the authors have presented two alternatives:
software and hardware (character generator). In order to
increase speed and usability, a hardware-oriented character
generator is best. If cost and flexibility are the criteria,
software-oriented character definition programs are better.
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memory  Space  for  Character  Definition 

To represent Korean and Chinese characters, one needs to code
5,000 characters for a character definition (2,400 Korean
characters; 1,800 Chinese characters; 800 user definable
characters, Roman alphabet, or symbols). If one uses a 24 bit
by 24 bit matrix font size for each character, at least 360 K
bytes are required (3 bytes * 24 * 5,000) for character
definition and 128 K bytes are required (64 K entries for a
16-bit address * 2 bytes for an address of character definition
memory) for a look-up table. The total memory requirement is
488 K bytes.
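
The arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical. Note that the 360 K figure counts thousands of bytes while the 128 K figure counts units of 1,024 bytes, so the 488 K total is an approximation.

```python
def font_memory_bytes(num_chars: int, width_bits: int, height_bits: int) -> int:
    """Bytes needed to store uncompressed bit map definitions."""
    bytes_per_row = (width_bits + 7) // 8   # 24 bits -> 3 bytes per row
    return bytes_per_row * height_bits * num_chars


def lookup_table_bytes(address_bits: int = 16, bytes_per_entry: int = 2) -> int:
    """Bytes needed for a table with one entry per possible 16-bit code."""
    return (1 << address_bits) * bytes_per_entry


definition = font_memory_bytes(5000, 24, 24)
lookup = lookup_table_bytes()
print(definition)            # prints 360000
print(lookup)                # prints 131072
print(definition + lookup)   # prints 491072
```

The same function shows why resolution dominates the cost: at 32 by 32 the definitions alone need 4 * 32 * 5,000 = 640,000 bytes.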

A  large  memory  space  is  required  for  the  definition 
of  the  characters.  Data  compression  of  these  characters  can 
be  considered  for  two  different  purposes:  data  transmission, 
and  computer  storage  and  output.  Here  one  is  mainly  inter¬ 
ested  in  the  latter  case,  where  the  main  point  is  the  total 
data  amount  to  be  stored.  The  method  of  data  compression  of 
Chinese  characters  can  be  classified  by  using  the  method¬ 
ology  listed  in  Table  XV  [Ref.  1:  p.  820]. 

There is a problem associated with the enlargement and
alignment of character patterns. The clarity of a character
depends on the size of the reproduction. If a large size is
required, the resolution must be high. Otherwise, stepwise
zigzags appear, which some people find unbearable. Avoiding
them would require storing the patterns for every font size,
which is uneconomical. Reproducing different character sizes
from the same data is therefore desired. However, the
enlargement and shrinking of character patterns from a single
set of data is quite difficult because, if the addition or the
deletion of a bit by the interpolation is not done properly, it
has a negative influence on the aesthetics. In enlargement, the
smoothness of an edge is particularly important, while in
shrinking the gap between strokes must be carefully maintained.
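
The stepwise zigzag effect is easy to reproduce. The sketch below enlarges a bit map by simple pixel replication, which is an assumption chosen for illustration; the text does not name a particular interpolation method.

```python
def enlarge(bitmap, factor):
    """Enlarge a bit map (list of '0'/'1' strings) by pixel replication."""
    enlarged = []
    for row in bitmap:
        wide_row = "".join(bit * factor for bit in row)  # repeat each dot
        enlarged.extend([wide_row] * factor)             # repeat each row
    return enlarged


# A one-dot-wide diagonal stroke becomes a staircase of 2 by 2 blocks.
diagonal = ["100", "010", "001"]
for row in enlarge(diagonal, 2):
    print(row.replace("0", ".").replace("1", "#"))
```

Each original dot turns into a square block, so a diagonal stroke keeps the same coarse steps at the larger size instead of gaining a smoother edge.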


TABLE XV

Varieties of Data Compression Methods

Purpose:         transmission; memory and reconstruction;
                 enlarge/shrink

Unit or
representation:  page unit; character unit; dot pattern
                 representation; stroke representation

Methods:         run-length coding
                 two-dimensional predictive coding
                 coding by scan line pattern unit
                 coding by m by n block pattern unit
                 checker board sampling
                 hexagonal board sampling
                 contour coordinate coding
                 contour following coding
                 mathematical equation for strokes
                 synthesis from partial character patterns



A  comparative  review  of  the  options  contained  in 
Table  XV  with  regard  to  determining  memory  size  is  very 
difficult.  This  is  because  the  requirements  for  character 
print  qualities  are  quite  different  depending  on  each 
method.  Simplicity  in  the  hardware  and  software  implementa¬ 
tion  of  the  compression  and  reconstruction  of  characters  is 
a  very  important  consideration.  Generally  speaking  high 
data  compression  methods  need  complex  hardware  and  longer 
times  for  reconstruction.  Therefore,  the  tradeoff  to  be 
considered  is  between  the  data  compression  ratio  and  the 
memory  size.  This  represents  the  classic  economic  tradeoff 
between  the  hardware/software  cost  with  regard  to  the  speed 
of  character  regeneration. 
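
As an illustration of this tradeoff, the sketch below implements run-length coding of a single bit map row, one of the methods named in Table XV. The encoding convention (run lengths only, always starting with a run of "0") is an assumption made for the example, not a format taken from the references.

```python
def rle_encode_row(row: str) -> list:
    """Encode a row of '0'/'1' as run lengths, starting with a run of '0'."""
    runs = []
    current, count = "0", 0
    for bit in row:
        if bit == current:
            count += 1
        else:
            runs.append(count)       # close the finished run
            current, count = bit, 1
    runs.append(count)
    return runs


def rle_decode_row(runs: list) -> str:
    """Rebuild the row; runs alternate between '0' and '1'."""
    bits = []
    value = "0"
    for length in runs:
        bits.append(value * length)
        value = "1" if value == "0" else "0"
    return "".join(bits)


row = "000000111111111111000000"     # one 24-dot row with a single stroke
runs = rle_encode_row(row)
print(runs)                          # prints [6, 12, 6]
print(rle_decode_row(runs) == row)   # prints True
```

A row with one stroke shrinks to three numbers, but a row crossing many strokes produces many runs, and every row must be decoded before it can be displayed, which is the reconstruction cost the text refers to.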


Because memory components are becoming less expensive, the
high-speed simple reconstruction method is preferred despite
the necessarily large memory. Many commercial machines have
adopted this concept, and store the character dot patterns as
they are without any data compression. For example, IBM
machines use only a 12 by 24 font size for simple letters
(Roman and Korean alphabet and symbols) instead of a 24 by 24
font [Ref. 8: p. 17]. The FACOM machines use software
definition for the second level of Korean and Chinese
characters, which are not used frequently, as a form of data
compression [Ref. 7: p. 12]. Because of the reduction in the
price of memory, the marketplace has shifted towards providing
direct character storage, i.e. a large memory, instead of
utilizing data compression.
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7.  EVALUATION  OF  SUGGESTED  METHODS 

The  principal  problems  in  current  editing  technology  for 
Chinese  and  Korean  characters  were  detailed  in  Chapter  III. 
Fundamentally,  the  problems  cause  user  inconvenience, 
require  lengthy  input  procedures,  and  result  in  complex 
update  requirements.  The  authors’  suggested  methods  will 
solve  most  problems  which  are  encountered  in  Korean  language 
data  processing.  More  research  and  development  remain  in  the 
following  areas: 

First, in an enumeration method, there is no problem except low
readability to Koreans and the inability to represent Chinese
characters. In this case, Chinese characters are ignored
because Korean language data can be represented through the use
of only Korean characters without serious problems. Low
readability is caused by unfamiliarity and the unbalanced shape
of each letter when written by an enumeration method. With a
minor change of shape of the letters, this method will
eliminate the above problems.

Second,  the  16-bit  code  for  the  three  kinds  of  charac¬ 
ters  requires  the  consideration  of  the  following  problems: 

1. The 32 key tops are complex since each key top represents
three or four letters and acronyms. One solution to this
problem is to use lighted, changeable key tops which represent
only one letter or acronym at a moment according to the
function keys and the order of keystrokes (i.e., 1st, 2nd, 3rd
letter and acronym).

2.  The  user  must  remember  whether  a  letter  to  be  typed 
is  the  first,  second,  or  third  letter,  and  whether  it 
is  an  acronym  of  a  sound  or  a  meaning  character. 
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3.  If  a  user  does  not  know  the  meaning  of  a  certain 
Chinese character to be typed, one must look up a
table which shows all meanings and sounds of all
possible Chinese characters.

4. In typing Korean characters which consist of only first and
second letters, a user has to hit the default key to make a
16-bit code. Instead of second letter and default keys, one can
use twenty-one more second letter keys which generate a 10-bit
code as the second and third letters. Unfortunately, this will
make the keyboard more complex.

5. Despite the authors' analysis of sound and meaning
characters and careful rearrangement of these codes, duplicate
codes still exist. This is because of the irregularities caused
by the natural evolution of sound and meaning characters for
over 2,000 years. Generally, using 3,000 or more Chinese
characters will cause duplicate codes to increase
proportionally. In order to eliminate the duplicate codes, the
Korean language committee needs to take measures to clarify the
meanings of the Chinese characters that cause duplicate codes
to exist.

Before the actual construction of the suggested system, an
economic (cost/benefit) analysis needs to be considered. Given
the r% discount rate and the various yearly costs and benefits
estimated from past data, the formula in Table XVI [Ref. 14]
can be used to derive the net present value of this project. It
simply states that the net present value (NPV) is equal to the
sum of the differences between benefits (B) and costs (C) in
each year (i) of the project life (T), each divided by the
relevant discount factor, (1 + r)^i, for that year. The current
estimate of the market size for word processing in Korea is
$2.5 million annually (Korean Daily Times, Sep 10 1984). But
this estimate will be in inverse proportion to the price of the
system and in direct proportion to the usefulness and the
user-friendliness until maturation.


TABLE XVI

Net Present Value Formula

\[
\mathrm{NPV} = \sum_{i=1}^{T} \frac{B_i - C_i}{(1 + r)^i}
\]

Legend:

NPV: Net Present Value
B: Benefit
C: Cost
i: Each year
T: Project life
r: Relevant factor (discount rate)

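
A minimal Python sketch of the net present value calculation follows. It assumes that the "relevant factor" for year i is the usual discount factor (1 + r)^i, with r given as a fraction; the function name and the sample figures are hypothetical and are not estimates from this study.

```python
def net_present_value(benefits, costs, rate):
    """Sum the discounted yearly differences between benefits and costs.

    benefits[i-1] and costs[i-1] belong to year i (i = 1 .. T);
    rate is the discount rate r as a fraction (0.10 for 10%).
    """
    if len(benefits) != len(costs):
        raise ValueError("benefits and costs must cover the same years")
    return sum(
        (b - c) / (1 + rate) ** year
        for year, (b, c) in enumerate(zip(benefits, costs), start=1)
    )


# Hypothetical three-year project: heavy cost first, benefits later.
sample = net_present_value([0, 80, 120], [100, 20, 20], 0.10)
print(f"NPV = {sample:.2f}")
print("feasible" if sample > 0 else "not feasible")
```

The project is worth undertaking, in the sense used in the text, only when the value returned is positive.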

In the above formula, the market price of the system
influences the benefits for manufacturers and the costs for
users. This system is feasible when the net present values are
positive for both manufacturers and users. If the benefits for
manufacturers and the costs for users are constant in a system,
the main problems will be:

1.  How  to  minimize  the  costs  for  manufacturers 

2.  How  to  maximize  the  benefits  for  users 

To  solve  the  above  problems,  the  best  approach  will  be  to 
make  an  efficient  and  user  friendly  system  for  Korean 
language  data  processing.  This  will  increase  the  number  (N) 
of  systems  sold,  and  increase  the  individual  productivity  of 
the  users. 

There  are  many  factors  and  constraints  which  cause  high 
cost  in  implementing  this  method.  Among  these,  the  following 


three  factors  affect  the  cost  performance  ratio  for  both 
manufacturers  and  users: 

1.  The  initial  design  cost:  For  this  system,  an  organi¬ 
zation  has  to  invest  initially  for  the  design  of 
about  5,000  Korean  and  Chinese  character  patterns, 
and  the  system  software  and  hardware.  As  the  number 
of  systems  produced  by  a  manufacturer  is  increased, 
the  unit  cost  of  each  system  will  be  decreased  as  the 
costs  are  spread  over  more  units. 

2.  Cost  for  character  generator:  As  mentioned  in  Chapter 
IV,  one  needs  about  500  K  bytes  memory  capacity  for 
these  character  definitions.  The  cost  of  memory  is 
decreasing  and  the  speed  is  increasing  as  technology 
is  being  developed.  This  cost  is  an  initial  cost  to 
users  when  buying  a  system. 

3. Cost for hardcopy: One can consider three kinds of printer
for hardcopy: dot matrix printer, chain printer, and laser
printer. It is not practical to use a chain printer for our
system since the chain will be approximately twenty meters long
(5,000 characters * 4 mm per character) and it would be
prohibitively slow. Currently, dot matrix printers and laser
printers cost more than chain printers, but they are the only
viable option.

Among  the  three  kinds  of  cost,  the  third  one  is  the  most 
serious  since  the  cost  of  hardcopy  is  increasing  as  its  use 
increases.  Recently  laser  printers  have  become  more  popular 
for  these  characters  because  of  the  good  quality,  high  speed 
and  decreasing  price.  Comparatively  though,  laser  printers 
are  still  relatively  expensive. 
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VI.  RECOMMENDATION  AND  CONCLDSIOS 


As the demand for data processing in Korea increases, users
will continue to encounter more and more problems in utilizing
the Korean language for data processing. The current methods of
convergence, display and select to implement the Korean
language in data processing must be considered only as interim
measures due to their inefficient and time consuming means of
data entry. In order to prevent this problem from becoming more
complicated due to the development of various new
implementations forwarded by independent research, a
standardized system must be developed.

This study examined two possible solutions for using the
Korean language in data processing. The enumeration method is
technologically feasible, inexpensive, and easy to implement,
but could not be used for applications within the Korean data
processing environment. This is because it results in a textual
form of Hangul that is unfamiliar to most Korean people.
Therefore the current enumeration method is not a feasible
solution to the Korean data processing problem.

The  second  method  examined  was  based  on  a  16-bit  code 
representation  of  Korean,  Chinese  characters,  and  the  Roman 
alphabet.  This  method  was  found  to  possess  all  the  advan¬ 
tages  currently  realized  by  the  EBCDIC  or  ASCII  code  repre¬ 
sentation  of  western  countries.  The  only  drawback  to  this 
system  is  that  it  might  not  be  cost  effective  based  on 
current  technology.  However,  due  to  the  rapid  development  of 
hardware  and  software  technology,  a  cost  effective  means 
should  be  available  within  the  next  few  years. 

In order to accelerate the determination of a thorough, broad
based solution to the Korean data processing problem, the
Korean government must organize and charge a national level
committee with the responsibility for investigating the problem
and determining a viable solution. This study, with its
proposal of a 16-bit character code, should be provided to that
committee for further examination. This proposal represents a
concept that could eventually lead to a long term viable
solution to the data entry and processing problems of using the
Korean language.
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THE  EVOLUTION  OF  CHINESE  CHAR  ACT  EES 
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FACCH  OS  IV  (KEF)  KEYBOARD 
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APPEN2H  l 

LOAD COMMAND PROGRAM FOR CURRENT KEYBOARD


1 ( a"KC"  ) 

l-  "oo; 

l"A"0008142241 7F4141 
1"D".007C222121  21  22  7C 
l"E"007F40407C40407F 
1«F"007F40407C404040 
’"G" 001E2140404721  IE 
i“H"00414141 7F414141 
1"I"003E0808080808 36 
1"3m003E4141414141 3E 
1"PM  007E41417E404040 
I "  Q"  0  C  3  E  4  1  4  1  4 1  4  5  4  2  3  0 
1"P"00  7E41417E<.44241 
1"S"003E41403E3*413£ 
1-T-007F080838380808 
1«U"0041  4141  41  41  4  1  3  £ 

1  "  W  "  0  0  4  1  41  41  49495522 
1-Y"004122 1408380808 
l"a" 00007F41414141 7F 
l"d" 00001C224141221C 

1**3M00007F404040407F 
1"*"00CC7F0101  7  F  <.  0  7F 
1"'7-000008  7F00  3E41  3E 
l"4»0000060808a8387F 
l"i"0000203E20203E20 
1  "  o  -  0  C  0  0  4  2  4  2  4  2  7  t  -  2  4  2 
1  "d"0000090909  790909 
1,,3,,000041  h17F«.141  7  F 
1" r"0G0G7F010101ClQ2 
1  "s"0C00>.040404040  7F 
l“t"3000080808142241 
l"u" 000001  1FC1 01 1 F  0 1 
l"u»"00007F080814224l 
1"/"000C22222222227F 
1  "  !  " 0008080908080008 
1"1"3G0318290906083E 
1  "  <  "  0  0 0  106186018  0  6 01 
1 -2-0C7E21 2 l 3E  2 121 7E 
l"C"00iE2  1  404G402  1  IE 
1" J"OO0EO404G4044433 
1"K«00434C  5860  584C43 
1  "L"0C4G4G4  0  40  4  04C  7F 
1-M"0041635549414l4l 
1"N" 0041615149454341 
1"V"0041412222141C08 


1-1-0000080808080808: 
l-m"000000000000007F; 
l-n-00007F0808080808; 
l«v»00007F22222222  7F  ; 
1-K-00007F007F40407F  : 
1-2-00007F011F010204; 

i-,"oorooooooo  302040; 
1-. "00000000  00  1  8  1  8  : 
1">"00201836Q1 06182c: 
1*" '000022  14  7C  1422  : 
1"*" 0000494977494977; 
1-:"000C0o087F0308: 
1-;*'  0000097909397909: 
1"CM00007F2222225549; 
1"3"0C00 7744444444 77: 
1"(,,00006020  1  0  : 
i")"OOOE10107010100E  : 
l-«"0C24247E247E2424; 
1"$-00093E4B3E093E08; 
1"?*'  0051^2  3  4081C2343; 
i"eM00060408: 

1"( "0004320101010204; 
1")"OOOC0OOOO03C0CcF; 
1"*"301C20h04040201C; 
1-*"0018130C001818; 

1 "-"  003000  7 f  ; 
1"/"OOOC222236494949: 
1 -0"0CIC2245495  1  22  1C  ; 
1 -2"0  0  3C42  0  1  0E  3040  7F  : 
1"3"007F0204CE01413E; 
1  -4-00040C  1  4  24  7F0404  ; 
l"5»0G7F4C7E4i:i413c: 
1"6m001E21407£61211E: 
1-7" GO  7F Cl  0  2  04  36  1C2C  : 
1  -  3-00  3E4141 3E4141  3E  I 
1-9 "00  3C424  33C0142  3C  : 
1  *  ="003C  OC  JOOOOC  33  1  c : 
l"?"003c41 3608093038; 
1 -3"0C242424  ; 

1 -\-0000  77  1  1  1  1  1  1  11  22  : 
l "*"00  3344*4  364  546  39  : 
1"_"0000007F007F: 
1-'"004141FF41495522: 
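
The load commands in the listing above appear to pair a key with sixteen hexadecimal digits of dot pattern data. The Python sketch below renders such a pattern, assuming that each pair of digits is one row of eight dots with the most significant bit at the left. That reading is an inference from legible entries such as "A" and "a" and is not stated in the listing; the function name is hypothetical.

```python
def render_glyph(hex_rows: str) -> list:
    """Turn 16 hex digits (8 row bytes) into 8 strings of '#' and '.'."""
    data = bytes.fromhex(hex_rows)
    return [
        "".join("#" if byte & (0x80 >> col) else "." for col in range(8))
        for byte in data
    ]


# Pattern data as read from the entries for the keys "A" and "a".
for hex_rows in ("00081422417F4141", "00007F414141417F"):
    print("\n".join(render_glyph(hex_rows)))
    print()
```

Under this assumption the first pattern draws a Roman capital A, and the second draws a closed box, which would be consistent with the Korean letter ㅁ being loaded onto the lowercase "a" key.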
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APPENDIX  G 

LOAD COMMAND PROGRAM FOR CAPITAL LETTER KEYBOARD


1 ( a"  les"  ) 

i"  -oo; 

l“!"0000007F01010101020C; 
1"*"000000007F0314224141  i 
1"*"000000082E483E093E03: 
1"*"0OOOOOOOOO7FOC22227F; 
1"*"00000000000018I3: 
1,,,M00  0  00039390S790905,39; 
1"."300000050525277D050S: 
1  "  /  " 00000009397909790)39; 
1"0"OC0000057D1525504545: 
l"i“OOFF01010131010101Ci; 
1"2"007F414141414141417p; 
1"3” 007F0808142241414141  ; 
1m4«007F0C2424242424?47F; 
1“5"0C494949497F4949*.9  7F; 
1m6"0000004040407E40404C; 
i"?HOOOGOGiOG00  98Q8J8Q37F; 
1"3M00000000300C00307F: 
1"9" 00000001010101030579; 
1"J *00000001701121274141; 
1"2"OOOGOGOC7C414141417F  ; 
l"a"000000007F4040404C7‘-; 
I" 3" 0000007515157745,577; 
1"C"000000007F307F404C7F; 
l"0"30000CQ07F31300rG2CC; 
1" E" 0000001C007F03142241 ; 
1"F” 00000077093909112244; 
1"G" 00 000077  1  5  1  5  754  54  5  77  ; 
1"H« 00000000771111714172; 
1"3"0000000040404040407F; 
1"PW3000C01CJ07F003E'»13E  ; 
1HSM 000000000408142241  41  ; 
1  "T« 00 00 00 00  12  12  32 4C 4  94  V  ; 
l«v"00 30 0000721212254949; 
1"W*'OC00000041,17F41417p; 
1-x”QO0O000C1C224141221C; 
l-T "00000000771015754077; 
1m2"000000007FJ1317F407F; 
1  "A"OC0000064GtF"»C464  9  7f,  ; 
l"a"307F404040*04040407F  ; 
l"c"007F00007F,040404C7F; 
I  "  d  "  Q  Q  7F313101 3F0101C10  1  ; 
1M<»"001C007F33142  2  414141; 


l"k"000000004444447C4444 

l"l"0OOOOCOlO17F 09091151 
l"*M000000007F 1414  141414 
l"n" 000QG002023EQ23EQ202 
l"o"0000000111 1111 7F0101 
1"d"303000 0404 2427 7C0404 
l“q“00404040404C4C40407F 
l"r"0ClCJ37FGC  1C2241221C 
l"s"0C01 020408 142241 4141 
l“t"0C7F  24  2424  34  4A-.94949 
1"u"000000003C141414147F 
1*v"0077444444-»444444477 
1*hi"00414141417F4141417F 
1"x"001C224141414141221C 
l"y"00000040407C4C7C404C 
l*j"0Q7F01C'lCl  7F4040407F 
1"E"00003Q1Q30501010107C 
1"C"00003C7F02049E01413E 
1 « >"00000004 OC  1424  7F 04 04 
1-p"OOOOOG3C42313E304C7E 
1 "I"000000  1C  224  05E  61 21  IE 
1"U"0C000C7?403E61 0141 3E 
1 '"'0C0C22  14  7F  1422  ; 

1 " *M30 1 8 l 30C0C  1  3  le  : 

1 "-"OOOQOO  7F  ; 
l":"000003367F0808  1 
1 "<"0001  06  1860  1  30601  ; 
l"=M000CCC000O0C081O; 
1">"0C2018060106192C: 
l"?"0G3E4l J606J3000S; 

1 " JM00  3F  414  1  3F  414  1  3E  I 
1"F"00  3C  424  3  3C01^23C  ; 

1 "L"00  3 H 4444  3b4646  39  ; 
1"""0C1C22454951221C: 

1  "N" 00080333030800 08; 
1"3" CO  7F  01 0204031  C  20  I 
1"P"00616204  a  8102343: 
]"CM0C102C-040-C2010; 

1 "\"00960406 ; 
l "  0-0004  02  0  1  01  0  1  02  041 
1 "."009000  7F0C  7F; 
1"*"0C«»1-»1  =  Fh1435522; 
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IBM  2-EYTE  IHTEENAL  HANGUL  CODE 
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